\chapter{GSI Community Tools}\label{gsi_tool}
 
%-------------------------------------------------------------------------------
\section{BUFR Format and BUFR Tools}
%-------------------------------------------------------------------------------

Under \textit{./util/bufr\_tools}, there are many Fortran examples to illustrate basic BUFR/PrepBUFR file process skills such as encoding, decoding, and appending. For details of these examples and the BUFR format, please see the BUFR/PrepBUFR User\textquotesingle s Guide, which is freely available on-line
\begin{scriptsize}
\begin{verbatim}
http://www.dtcenter.org/com-GSI/BUFR/docs/index.php
\end{verbatim}
\end{scriptsize}
The observation BUFR files generated by NCEP (for example, PrepBUFR and BUFR files from NCEP ftp server or gdas1.t12z.prepbufr.nr in tutorial case) are in Big Endian binary format. For release version older than 3.2, the BUFR files have to be converted from  Big Endian BUFR file to Little Endian file when used by the GSI on Linux platform. Please refer to older version User\textquotesingle s guide on how to convert.

Since release 3.2, BUFRLIB can automatically identify and handle either byte orders. For Intel and PGI compilers on Linux, the Big Endian BUFR/PrepBUFR files can be used by GSI without byte swap.

%-------------------------------------------------------------------------------
\section{Read GSI Diagnostic Files}
%-------------------------------------------------------------------------------

Lots of useful information about how one observation was used in the analysis such as innovation, observation values, observation error and adjusted observation error, and quality control information, has been saved in diagnostic files. To generate these diagnosis files, namelist variable \textit{write\_diag} in namelist section \textit{\&SETUP} needs to be true. The \textit{write\_diag} variable has been introduced in Part 4 of Section 3.4. The following is an example of using the \textit{write\_diag} variable to control diagnostic files. When we set the number of outer loops to 2, and set the \textit{write\_diag} namelist variable to the following:

\begin{scriptsize}
\begin{verbatim}
write_diag(1)=.true.,write_diag(2)=.false.,write_diag(3)=.true.,
\end{verbatim}
\end{scriptsize}

GSI will write out diagnostic files before the start of the 1st outer loop start (O-B) and after the completion of the 2nd outer loop finish (O-A). We don\textquotesingle t want GSI to write out diagnosis files after the 1st outer loop because we set \textit{write\_diag(2)=.false.} \\

This is what we set in our example case described in section 5.2. From this case, we can see the following diagnostic files generated from the GSI analysis:
\begin{scriptsize}
\begin{verbatim}
diag_amsua_metop-a_anl.2014061700  diag_amsua_n18_ges.2014061700
diag_amsua_metop-a_ges.2014061700  diag_conv_anl.2014061700
diag_amsua_n15_anl.2014061700      diag_conv_ges.2014061700
diag_amsua_n15_ges.2014061700      diag_hirs4_metop-a_anl.2014061700
diag_amsua_n18_anl.2014061700      diag_hirs4_metop-a_ges.2014061700
\end{verbatim}
\end{scriptsize}

All files are identified with a filename containing three elements.  The first element  "diag" indicates these are combined diagnostic files.  The second element identifies the observation type (here, "conv" means conventional observation from prepbufr and "amsua\_n15" corresponds to radiance observation AMSU-A from NOAA 15).  The last element identifies which step of outer loop the files were generated for.  Here, "anl" means the contents were written after the last outer loop (from \textit{write\_diag(3)=.true.}) and "ges" means the contents were written before the first output loop (from \textit{write\_diag(1)= .true.}). \\

To help users to read the information from these diagnostic files, we have provided two Fortran programs in the  \textit{./util/Analysis\_Utilities/read\_diag/} directory:

\begin{hangparas}{.4in}{1} 
\hspace{2ex} \textit{read\_diag\_conv.f90} : Reads the diagnostic files for conventional observations. For example: \textit{diag\_conv\_anl.2014061700} and \textit{diag\_conv\_ges.2014061700}

\hspace{2ex} \textit{read\_diag\_rad.f90} : Reads the diagnostic files for radiance observation. For example: 
 \end{hangparas}
 
\begin{scriptsize}
\begin{verbatim}
       diag_amsua_n15_ges.2014061700 diag_hirs4_metop-a_anl.2014061700
       diag_amsua_n18_anl.2014061700 diag_hirs4_metop-a_ges.2014061700
\end{verbatim}
\end{scriptsize}

To compile the programs, use the makefile provided: 
\begin{scriptsize}
\begin{verbatim}
./make 
\end{verbatim}
\end{scriptsize}

Note: since information in the GSI \textit{include} directory is required, the GSI must have been compiled first.

To run \textit{read\_diag\_conv.exe}, a namelist file \textit{namelist.conv} needs to be in the directory along with the executable. The namelist.conv only has two parameters:

\begin{scriptsize}
\begin{verbatim}
&iosetup
  infilename='./diag_conv_anl',     : The path and name of GSI diagnosis file
  outfilename='./results_conv_anl', : The path and name of a text file used to
 		          save the content of the diagnostic file
 /
\end{verbatim}
\end{scriptsize}

The user can set the test case directory and file \textit{diag\_conv\_anl.2014061700} from section 5.2 as the entry for \textit{infilename} in the namelist, then run the executable

\begin{scriptsize}
\begin{verbatim}
./read_diag_conv.exe
\end{verbatim}
\end{scriptsize}

The results are placed in the file specified by the outfilename entry in the namelist. In this case that would be a file \textit{results\_conv\_anl} located in the directory where the executable was run.

Similarly, to run \textit{read\_diag\_rad.exe}, the namelist file namelist.rad is needed. It contains the same parameters as \textit{namelist.conv} but it links to radiance diag files. After setting it to use the same case from section 5.2, such as:
\begin{scriptsize}
\begin{verbatim}
&iosetup
  infilename='(test directory)/diag_amsua_n18_ges.2014061700',
  outfilename='./results_amsua_n18_ges',
 /     
\end{verbatim}
\end{scriptsize}

Running the executable creates the text file \textit{results\_amsua\_n18\_ges} specified by the namelist in the directory \textit{read\_diag\_rad.exe} runs.\\

For the conventional observations, the data is stored in two arrays: \textit{rdiagbuf} and \textit{cdiagbuf}. Their contents are listed as follows, for temperature (check \textit{src/main/setupt.f90}) as an example:
\begin{scriptsize}
\begin{verbatim}
   cdiagbuf     = station id
   rdiagbuf(1)  = observation type
   rdiagbuf(2)  = observation subtype
   rdiagbuf(3)  = observation latitude (degrees)
   rdiagbuf(4)  = observation longitude (degrees)
   rdiagbuf(5)  = station elevation (meters)
   rdiagbuf(6)  = observation pressure (hPa)
   rdiagbuf(7)  = observation height (meters)
   rdiagbuf(8)  = observation time (hours relative to analysis time)
   rdiagbuf(9)  = input prepbufr qc or event mark
   rdiagbuf(10) = setup qc or event mark (currently qtflg only)
   rdiagbuf(11) = read_prepbufr data usage flag
   rdiagbuf(12) = analysis usage flag (1=use, -1=not used)
   rdiagbuf(13) = nonlinear qc relative weight
   rdiagbuf(14) = prepbufr inverse obs error (K**-1)
   rdiagbuf(15) = read_prepbufr inverse obs error (K**-1)
   rdiagbuf(16) = final inverse observation error (K**-1)
   rdiagbuf(17) = observation (K)
   rdiagbuf(18) = obs-ges used in analysis (K)
   rdiagbuf(19) = obs-ges without bias correction (K)
\end{verbatim}
\end{scriptsize}

For wind observations, the content after index 16  (check \textit{src/main/setupw.f90}) is:
\begin{scriptsize}
\begin{verbatim}
   rdiagbuf(17) = earth relative u wind component observation (m/s)
   rdiagbuf(18) = earth relative u obs-ges used in analysis (m/s)
   rdiagbuf(19) = earth relative u obs-ges w/o bias correction (m/s) 
   rdiagbuf(20) = earth relative v wind component observation (m/s)
   rdiagbuf(21) = earth relative v obs-ges used in analysis (m/s)
   rdiagbuf(22) = earth relative v obs-ges w/o bias correction (m/s) 
\end{verbatim}
\end{scriptsize}


The \textit{read\_diag\_conv.exe} reads these arrays and outputs important information in the text file \textit{results\_conv\_anl}specified by the user in the \textit{\&iosetup} namelist.  For example:

\begin{scriptsize}
\begin{verbatim}  
   station      obs       obs      obs        obs         obs       usag     obs       O-B
       ID       type     time   latitude   longitude    pressure            value              
ps @ 46047    : 180     -0.17     32.40     240.50     1014.30       1     1014.30   -0.09
 t @ 72293    : 120     -0.98     32.85     242.88      996.00       1      297.25    1.26
uv @ 72293    : 220     -0.98     32.85     242.88      996.00       1        3.90    0.38   3.30    2.66
\end{verbatim}
\end{scriptsize}
 
For wind, the last 4 columns are the wind components in the order of: U observation, O-B for U, V observation, O-B for V.

For radiance observations, the data is stored in two arrays: \textit{diagbuf} and \textit{diagbufchan}. Their contents are listed as follows (please refer to \textit{src/main/setuprad.f90} for more details):
\begin{scriptsize}
\begin{verbatim}  
  diagbuf(1)  = observation latitude (degrees)
  diagbuf(2)  = observation longitude (degrees)
  diagbuf(3)  = model (guess) elevation at observation location
  diagbuf(4)  = observation time (hours relative to analysis time)

  diagbuf(5)  = sensor scan position
  diagbuf(6)  = satellite zenith angle (degrees)
  diagbuf(7)  = satellite azimuth angle (degrees)
  diagbuf(8)  = solar zenith angle (degrees)
  diagbuf(9)  = solar azimuth angle (degrees)
  diagbuf(10) = sun glint angle (degrees) (sgagl)

  diagbuf(11) = surface fractional coverage by water
  diagbuf(12) = surface fractional coverage by land
  diagbuf(13) = surface fractional coverage by ice
  diagbuf(14) = surface fractional coverage by snow
if(.not. retrieval)then
  diagbuf(15) = surface temperature over water (K)
  diagbuf(16) = surface temperature over land (K)
  diagbuf(17) = surface temperature over ice (K)
  diagbuf(18) = surface temperature over snow (K)
  diagbuf(19) = soil temperature (K)
  diagbuf(20) = soil moisture
  diagbuf(21) = surface land type
else
  diagbuf(15) = SST first guess used for SST retrieval
  diagbuf(16) = NCEP SST analysis at t
  diagbuf(17) = Physical SST retrieval
  diagbuf(18) = Navy SST retrieval
  diagbuf(19) = d(ta) corresponding to sstph
  diagbuf(20) = d(qa) corresponding to sstph
  diagbuf(21) = data type
endif
  diagbuf(22) = vegetation fraction
  diagbuf(23) = snow depth
  diagbuf(24) = surface wind speed (m/s)
!   Note:  The following quantities are not computed for all sensors
  if (.not.microwave) then
     diagbuf(25)  = cloud fraction (%)
     diagbuf(26)  = cloud top pressure (hPa)
  else
     diagbuf(25)  = cloud liquid water (kg/m**2)
     diagbuf(26)  = total column precip. water (km/m**2)
  endif

  diagbuf(27) = foundation temperature: Tr
  diagbuf(28) = diurnal warming: d(Tw) at depth zob
  diagbuf(29) = sub-layer cooling: d(Tc) at depth zob
  diagbuf(30) = d(Tz)/d(Tr)
\end{verbatim}
\end{scriptsize}

\textit{diagbufchan} include loop through channel \textit{i} from 1 to \textit{nchanl}:

\begin{scriptsize}
\begin{verbatim}  
    diagbufchan(1,i)= observed brightness temperature (K)
    diagbufchan(2,i)= observed - simulated Tb with bias corrrection (K)
    diagbufchan(3,i)= observed - simulated Tb with no bias correction (K)
    diagbufchan(4,i)= inverse observation error
    diagbufchan(5,i)= quality control mark or event indicator
    diagbufchan(6,i)= surface emissivity
    diagbufchan(7,i)= stability index
    diagbufchan(8,i)= d(Tb)/d(Ts)
    do j=1,npred+1
       diagbufchan(7+j,i)= Tb bias correction terms (K)
    end do
\end{verbatim}
\end{scriptsize}

In the sample output file \textit{results\_amsua\_n18\_ges}, only the observation location and time are written in the file. Users can write out other information based on the list. 

%-------------------------------------------------------------------------------
\section{Read and Plot Convergence Information from fort.220}
%-------------------------------------------------------------------------------

In section 4.6, we introduced how to check the convergence information in the \textit{fort.220} file.  Further detail on the \textit{fort.220} convergence information can be found in the Advanced User\textquotesingle s Guide. Here, we provide tools to filter this file and to plot the values of the cost function and the norm of gradient during each iteration.

These tools - one ksh script and one ncl script - are in:
\textit{./util/Analysis\_Utilities/plot\_cost\_grad} directory:

The ksh script, \textit{filter\_fort220.ksh}, only has one line:
\begin{tiny}
\begin{verbatim}  
grep 'cost,grad,step,b' fort.220 | sed -e 's/cost,grad,step,b,step? =   //g' | sed -e 's/good//g' > cost_gradient.txt
\end{verbatim}
\end{tiny}

To run \textit{filter\_fort220.ksh}, the \textit{fort.220} needs to be in the same directory. The script will filter out the values of the cost function and the norm of gradient at each iteration from 
\textit{fort.220} into a text file called \textit{cost\_gradient.txt}. \\

Once the file \textit{cost\_gradient.txt} is ready, run ncl script \textit{GSI\_cost\_gradient.ncl} to generate the plot:
\begin{scriptsize}
\begin{verbatim}  
ncl GSI_cost_gradient.ncl
\end{verbatim}
\end{scriptsize}

The pdf file \textit{GSI\_cost\_gradient.pdf} is created. The pdf file contains plots of the convergence of the GSI analysis like those in section 4.6.

%-------------------------------------------------------------------------------
\section{Plot Single Observation Test Result and Analysis Increment}
%-------------------------------------------------------------------------------

In Section 4.2, we introduced how to do a single observation test for GSI. Here we provide users with the ncl scripts to plot the single observation test results.\\

There are 5 ncl scripts in the \textit{util/Analysis\_Utilities/plots\_ncl} directory:
\begin{table}[htbp]
\centering
\caption{the list of ncl plotting tools}
\begin{tabular}{|p{5cm}|p{9cm}|}
\hline
\hline
GSI\_singleobs\_arw.ncl&Plot single observation test with ARW NetCDF background\\
\hline
GSI\_singleobs\_nmm.ncl&Plot single observation test with NMM NetCDF background\\
\hline
Analysis\_increment.ncl&Plot analysis increment from the case with ARW NetCDF background\\
\hline
Analysis\_increment\_nmm.ncl&Plot analysis increment from the case with NMM NetCDF background\\
\hline
fill\_nmm\_grid2.ncl&	E grid to A grid convertor\\
\hline
\end{tabular}
\label{taba1}
\end{table} 

The main difference between the ARW and NMM core used in GSI is that ARW is on a C grid, while NMM is on an E grid. \textit{GSI\_singleobs\_nmm.ncl} calls \textit{fill\_nmm\_grid2.ncl} to convert the E grid to an A grid for plotting, while \textit{GSI\_singleobs\_arw.ncl} itself includes a C grid to A grid convertor.

Before running ncl scripts, users need to set up two links:
\begin{itemize}
\item \textit{cdf\_analysis}  - Link to analysis result in NetCDF format (\textit{wrf\_inout})
\item \textit{cdf\_bk} - Link to background file in netCDF format
\end{itemize}
These scripts read in the analysis and background fields of temperature (T), U component of wind (U), V component of wind (V), and moisture (Q) and calculate the difference of the analysis field minus the background field. Then XY sections (left column) and XZ sections (right column) are plotted for T, U, V, and Q through the point that has maximum analysis increment of single observation. Here the default single observation test is T. If the user conducts other single observation tests, the corresponding changes should be made based on the current scripts.\\

The scripts \textit{Analysis\_increment.ncl} and \textit{Analysis\_increment\_nmm.ncl} are very similar the one for the single observation but only the XY section for a certain analysis level is plotted.\\

For more information on how to use ncl, please check the NCL website at:
\begin{scriptsize}
\begin{verbatim}  
http://www.ncl.ucar.edu/
\end{verbatim}
\end{scriptsize}

 
%-------------------------------------------------------------------------------
\section{Generate initial regional ensembles }
%-------------------------------------------------------------------------------

Under the \textit{./util/EnKF} directory, there are two sub-directories:\textit{ enspreproc\_regional.fd/} and \textit{initialens\_regional.fd/}. The first one is to extract ensemble pertubations from GDAS 80 member ensembles and the second one is to add the extracted ensembles to a regional WRF background field (considered as the mean filed) to generate initial regional ensembles.

Before using these two unitilies, you should have already sucessfully compiled the GSI and gotten the "gsi.exe" file. After that, enter each of the two directory, type "make" to compile the utilities. A sucessful compilation should yield "enspreproc.exe" and "initialens.exe" respectively.

Now, the next step is to get GDAS spectrally smoothed atmospheric ensemble forecasts. These files should be in the sigma format, which is currrenlty the only format supported by "enspreproc.exe". You need to contact NCEP or other appropriate contacts to download these kind of ensembles. These ensemble files follow the name convection of "sfg\_\$CDATE\_fhr\$FEs\_mem\$MEM". \$CDATE is the cycle date, such as 2017011518 which means 18z of Jan. 15th, 2017. \$FE is the forecast hour, for example, 06 means 6 hour of forecasts. \$MEM is the member number. Here is an example of GDAS ensmbles: \textit{sfg\_2017011518\_fhr06s\_mem001}. 

After you download the required GDAS ensembles, follow the following steps:

1. Running "enspreproc.exe", enter the \textit{enspreproc\_regional.fd/} directory:

(1). generate the file "fileslist01". This file lists the ensemble files to be used in the calculation of ensemble pertubations. For example, if it is determined to use 20 members to generate ensemble perturbations, the file "filelist01" will be as follows:
\begin{scriptsize}
\begin{verbatim}
           sfg_2017011518_fhr06s_mem001
           sfg_2017011518_fhr06s_mem002
           sfg_2017011518_fhr06s_mem003
           ...
           sfg_2017011518_fhr06s_mem018
           sfg_2017011518_fhr06s_mem019
           sfg_2017011518_fhr06s_mem020
\end{verbatim}
\end{scriptsize}

(2). Modify the file "namelist.input", change "n\_ens" to the total number of ensembles to be used.

(3). Copy the "anavinfo" file used by GSI into current directory.

(4). Copy the background WRF file, name it as "wrf\_inout".

(5). Create a job description file, submit the job to get it run in parallel.

After the successful running of "enspreproc.exe", you will get ensemble perturbations as follows:
\begin{scriptsize}
\begin{verbatim}
           en_perts4ars.mem0001
           en_perts4ars.mem0002
           en_perts4ars.mem0003
           ...
           en_perts4ars.mem0018
           en_perts4ars.mem0019
           en_perts4ars.mem0020
\end{verbatim}
\end{scriptsize}

2. Runnning  "initialens.exe", enter the \textit{initialens\_regional.fd/} directory:

(1). Modify the file "namelist.input", change "n\_ens" to the total number of ensembles to be used.

(2). Copy wrf\_inout to current directry

(3). Copy wrf\_inout to wrfinput\_d01.mem\$MEM files as follows:
\begin{scriptsize}
\begin{verbatim}
           cp wrf_inout wrfinput_d01.mem0001
           cp wrf_inout wrfinput_d01.mem0002
           cp wrf_inout wrfinput_d01.mem0003
           ...
           cp wrf_inout wrfinput_d01.mem0018
           cp wrf_inout wrfinput_d01.mem0019
           cp wrf_inout wrfinput_d01.mem0020
\end{verbatim}
\end{scriptsize}

 Be sure that each member has a correspoding wrfinput\_d01 file. These files will be updated by "initialens.exe" later. 

(4). Link the ensemble perturbations generated by "enspreproc.exe" to current directory. Something like this \textit{ln -s ../enspreproc\_regional.fd/en\_perts4arw.mem*}. 

(5). Create a job description file, submit the job to get it run in parallel. Please note that only 1 processor is required to run "initialens.exe" but submitting it to run on computing node is a must.

After the sucessful running of "initialens.exe", all the \textit{wrfinput\_d01.mem\$MEM} files are updated with ensemble perturbation added to the background or "mean" state of the original wrf\_inout.

Now the initial regionl ensembles have been sucessfully generated.



